Choosing the Best Classification Performance Metric for Wrapper-based Software Metric Selection for Defect Prediction

Authors

  • Huanjing Wang
  • Taghi M. Khoshgoftaar
  • Amri Napolitano
Abstract

Software metrics and fault data are collected during the software development cycle, and a typical software defect prediction model is trained on this collected data. The quality and characteristics of the underlying software metrics therefore play an important role in the efficacy of the prediction model. However, superfluous software metrics often exist, so identifying a small subset of metrics becomes an essential task before building defect prediction models. Wrapper-based feature (software metric) subset selection uses a classifier to discover which feature subsets are most useful. To the best of our knowledge, no previous work has examined how the choice of performance metric within wrapper-based feature selection affects classification performance. In this paper, we use five wrapper-based feature selection methods to remove irrelevant and redundant features. The five wrappers differ in the performance metric used in the model evaluation process: Overall Accuracy (OA), Area Under the ROC (Receiver Operating Characteristic) Curve (AUC), Area Under the Precision-Recall Curve (PRC), Best Geometric Mean (BGM), and Best Arithmetic Mean (BAM). The models are trained using the logistic regression learner both inside and outside the wrappers. The case study is based on software metrics and defect data collected from a real-world software project. The results demonstrate that BAM is the best performance metric to use within the wrapper. Moreover, compared with models built on the full datasets, the performance of defect prediction models can be improved when metric subsets are selected through a wrapper subset selector.
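The idea of swapping the performance metric inside a wrapper can be illustrated with a minimal sketch. Several things here are assumptions, not details taken from the abstract: BGM and BAM are read as the maximum, over a grid of decision thresholds, of the geometric and arithmetic mean of the true positive and true negative rates; the subset search is a simple greedy forward search; and a stand-in scorer (the average of the selected metric columns) replaces the logistic regression learner so that the example stays dependency-free. The metric names `loc`, `churn`, and `noise` and all data values are hypothetical.

```python
from math import sqrt

# Assumption: thresholds are swept on a 0.0..1.0 grid in steps of 0.1.
THRESHOLDS = [i / 10 for i in range(11)]

def rates(labels, scores, threshold):
    """Return (TPR, TNR), predicting 'defective' (1) when score >= threshold.
    Assumes both classes occur in labels."""
    pos = sum(labels)
    neg = len(labels) - pos
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    return tp / pos, tn / neg

def overall_accuracy(labels, scores, threshold=0.5):
    """Fraction of modules classified correctly at one fixed threshold."""
    hits = sum(1 for y, s in zip(labels, scores) if (s >= threshold) == (y == 1))
    return hits / len(labels)

def best_geometric_mean(labels, scores):
    """Assumed BGM: max over thresholds of sqrt(TPR * TNR)."""
    return max(sqrt(tpr * tnr)
               for tpr, tnr in (rates(labels, scores, t) for t in THRESHOLDS))

def best_arithmetic_mean(labels, scores):
    """Assumed BAM: max over thresholds of (TPR + TNR) / 2."""
    return max((tpr + tnr) / 2
               for tpr, tnr in (rates(labels, scores, t) for t in THRESHOLDS))

def forward_wrapper(features, evaluate):
    """Greedy forward search (an assumed strategy): repeatedly add the
    feature that most improves evaluate(subset); stop when nothing helps."""
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining:
        score, feat = max((evaluate(selected + [f]), f) for f in remaining)
        if score <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best

# Hypothetical toy data: 1 = defective module, metrics scaled to [0, 1].
LABELS = [1, 1, 1, 0, 0, 0]
DATA = {
    "loc":   [0.9, 0.8, 0.4, 0.3, 0.2, 0.1],
    "churn": [0.7, 0.3, 0.6, 0.5, 0.2, 0.1],
    "noise": [0.1, 0.9, 0.2, 0.8, 0.3, 0.7],
}

def stand_in_scores(subset):
    """Stand-in for a trained learner: average the selected metric columns."""
    columns = [DATA[name] for name in subset]
    return [sum(values) / len(values) for values in zip(*columns)]

def make_evaluator(metric):
    """Bind a performance metric so the wrapper can score any subset."""
    return lambda subset: metric(LABELS, stand_in_scores(subset))

selected, best = forward_wrapper(DATA, make_evaluator(best_arithmetic_mean))
print(selected, best)
```

Passing `overall_accuracy` or `best_geometric_mean` to `make_evaluator` changes only the score the wrapper optimizes, which is the single point of variation among the five wrappers described above. A faithful version would replace `stand_in_scores` with a cross-validated logistic regression fitted on each candidate subset.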


Similar resources

Stability of Three Forms of Feature Selection Methods on Software Engineering Data

One of the major challenges when working with software metrics datasets is that some metrics may be redundant or irrelevant to software defect prediction. This may be addressed using feature (metric) selection, which chooses an appropriate subset of features for use in downstream computation. There are three major forms of feature selection: filter-based feature rankers, which use statistical ...


Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection

Due to the rise of technology, the possibility of fraud in different areas such as banking has increased. Credit card fraud is a crucial problem in banking, and its danger is ever increasing. This paper proposes an advanced data mining method that considers both feature selection and decision cost to improve the accuracy of credit card fraud detection. After selecting the best and most effec...


A Novel Approach for Improving Software Quality Prediction

Software quality prediction is a process of utilizing software metrics such as code-level measurements and defect data to build classification models that are able to estimate the quality of program modules. These kinds of estimations can help software managers to effectively allocate potentially limi...


Evaluation of Classifiers in Software Fault-Proneness Prediction

The reliability of software depends on its fault-prone modules: the fewer fault-prone units the software contains, the more we may trust it. Therefore, if we are able to predict the number of fault-prone modules in the software, it will be possible to judge its reliability. In predicting fault-prone software modules, one of the contributing features is the software metric by which one ...


Choosing software metrics for defect prediction: an investigation on feature selection techniques

The selection of software metrics for building software quality prediction models is a search-based software engineering problem. An exhaustive search for such metrics is usually not feasible due to limited project resources, especially if the number of available metrics is large. Defect prediction models are necessary to help project managers better utilize valuable project resources f...




Publication date: 2014